18 research outputs found
Models of Visual Attention in Deep Residual CNNs
Feature reuse from earlier layers in neural network hierarchies has been shown to improve the quality of features at later stages, a concept known as residual learning. In this thesis, we develop effective residual learning methodologies infused with attention mechanisms and observe their effect on different tasks. To this end, we propose three architectures across medical image segmentation and 3D point cloud analysis. In FocusNet, we propose an attention-based dual-branch encoder-decoder structure that learns an extremely efficient attention mechanism, achieving state-of-the-art results on the ISIC 2017 skin cancer segmentation dataset. We propose a novel loss enhancement that improves the convergence of FocusNet, performing better than state-of-the-art loss functions such as the Tversky and focal losses. Evaluation of the architecture reveals two drawbacks, which we fix in FocusNetAlpha. Our novel residual group attention block forms the backbone of this architecture, learning distinct features with sparse correlations, which is the key reason for its effectiveness. At the time of writing this thesis, FocusNetAlpha outperforms all state-of-the-art convolutional autoencoders while using the fewest parameters and FLOPs, based on our experiments on the ISIC 2018, DRIVE retinal vessel segmentation and cell nuclei segmentation datasets. We then shift our attention to 3D point cloud processing, where we propose SAWNet, which combines global and local point embeddings infused with attention to create a spatially aware embedding that outperforms both. We propose a novel method to learn global feature aggregation for point clouds via a fully differentiable block that requires few trainable parameters and yields clear performance gains. SAWNet beats state-of-the-art results on the ModelNet40 and ShapeNet part segmentation datasets.
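The attentive residual learning idea running through the thesis can be sketched as follows; `attentive_residual` and its gate are hypothetical simplifications for illustration, not the exact blocks used in FocusNet or SAWNet.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentive_residual(x, fx, gate_logits):
    """Residual feature reuse modulated by attention (illustrative only):
    earlier features x are carried forward, but an attention gate decides
    how strongly each one contributes alongside the transformed f(x)."""
    gate = sigmoid(gate_logits)        # per-feature weights in (0, 1)
    return fx + gate * x               # plain residual learning is the gate == 1 case

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))       # earlier-layer features
fx = rng.standard_normal((16, 8))      # transformed features f(x)
out = attentive_residual(x, fx, rng.standard_normal((16, 8)))
```

With the gate saturated at one, this reduces to an ordinary residual connection; the attention weights let the network suppress earlier features that are not worth reusing.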
FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation
We propose a novel technique to incorporate attention within convolutional
neural networks using feature maps generated by a separate convolutional
autoencoder. Our attention architecture is well suited for incorporation with
deep convolutional networks. We evaluate our model on benchmark segmentation
datasets in skin cancer segmentation and lung lesion segmentation. Results show
highly competitive performance when compared with U-Net and its residual
variant.
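The gating step can be sketched as below; the `(C, C)` projection `w` stands in for a learned 1x1 convolution, and all names are illustrative rather than the paper's exact layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(encoder_feat, autoencoder_feat, w):
    """Gate the main encoder's feature map with attention derived from the
    separate autoencoder branch.

    encoder_feat, autoencoder_feat: (H, W, C) feature maps.
    w: (C, C) projection weights (stand-in for a 1x1 convolution).
    """
    gate = sigmoid(autoencoder_feat @ w)   # per-pixel, per-channel weights in (0, 1)
    return encoder_feat * gate             # soft attention rescales the encoder features

rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 8, 4))       # main encoder features
ae = rng.standard_normal((8, 8, 4))        # autoencoder-branch features
w = 0.1 * rng.standard_normal((4, 4))
gated = attention_gate(enc, ae, w)
```

Because the gate lies in (0, 1), the attention can only attenuate encoder activations, never amplify them, which keeps the mechanism cheap and stable.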
FocusNet++: Attentive Aggregated Transformations for Efficient and Accurate Medical Image Segmentation
We propose a new residual block for convolutional neural networks and
demonstrate its state-of-the-art performance in medical image segmentation. We
combine attention mechanisms with group convolutions to create our group
attention mechanism, which forms the fundamental building block of our network,
FocusNet++. We employ a hybrid loss based on balanced cross entropy, Tversky
loss and the adaptive logarithmic loss to enhance the performance along with
fast convergence. Our results show that FocusNet++ achieves state-of-the-art
results across various benchmark metrics for the ISIC 2018 melanoma
segmentation and the cell nuclei segmentation datasets with fewer parameters
and FLOPs.
Comment: Published at ISBI 202
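The group attention idea can be illustrated as follows. This is a simplified sketch combining a group-convolution-style channel split with squeeze-and-excite-style gating; the real FocusNet++ block also learns the gating weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_attention(x, n_groups):
    """Split the channels into groups, as a group convolution would, and
    apply channel attention independently within each group, encouraging
    each group to specialise on distinct, weakly correlated features."""
    h, w, c = x.shape
    assert c % n_groups == 0, "channels must divide evenly into groups"
    gsize = c // n_groups
    out = np.empty_like(x)
    for g in range(n_groups):
        grp = x[..., g * gsize:(g + 1) * gsize]
        squeeze = grp.mean(axis=(0, 1))                 # global average pool per channel
        out[..., g * gsize:(g + 1) * gsize] = grp * sigmoid(squeeze)
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8))
y = group_attention(feat, n_groups=4)
```

Grouping keeps the attention parameter count low, which is consistent with the abstract's claim of fewer parameters and FLOPs.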
Penalizing small errors using an Adaptive Logarithmic Loss
Loss functions are error metrics that quantify the difference between a
prediction and its corresponding ground truth. Fundamentally, they define a
functional landscape for traversal by gradient descent. Although numerous loss
functions have been proposed to date in order to handle various machine
learning problems, little attention has been given to enhancing these functions
to better traverse the loss landscape. In this paper, we simultaneously and
significantly mitigate two prominent problems in medical image segmentation
namely: i) class imbalance between foreground and background pixels and ii)
poor loss function convergence. To this end, we propose an adaptive logarithmic
loss function. We compare this loss function with the existing state-of-the-art
on the ISIC 2018 dataset, the nuclei segmentation dataset as well as the DRIVE
retinal vessel segmentation dataset. We measure the performance of our
methodology on benchmark metrics and demonstrate state-of-the-art performance.
More generally, we show that our system can be used as a framework for better
training of deep neural networks.
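The exact adaptive logarithmic loss is defined in the paper; the sketch below only illustrates the general idea of wrapping a segmentation loss in a logarithmic term so that small residual errors still produce strong gradients. The Dice base loss and the threshold-like scale `gamma` are assumptions for the example.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probability maps in [0, 1]."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def log_enhanced_loss(pred, target, gamma=0.1):
    """Illustrative log enhancement (not the paper's exact formula): the
    gradient of log1p(d / gamma) with respect to the Dice error d is
    1 / (gamma + d), so it is steepest precisely when the remaining
    error d is small, penalizing small errors more aggressively."""
    d = dice_loss(pred, target)
    return np.log1p(d / gamma)

target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0                      # a square foreground mask
perfect = target.copy()                     # exact prediction
rough = np.clip(target + 0.2, 0.0, 1.0)     # prediction with background leakage
```

A perfect prediction gives (near) zero loss, while the log term keeps the slope high as the error approaches zero, which is where plain losses tend to flatten out.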
FatNet: feature-attentive network for 3D point cloud processing
The application of deep learning to 3D point clouds is challenging due to their lack of order. Inspired by the point embeddings of PointNet and the edge embeddings of DGCNNs, we propose three improvements to the task of point cloud analysis. First, we introduce a novel feature-attentive neural network layer, a FAT layer, that combines both global point-based features and local edge-based features in order to generate better embeddings. Second, we find that applying the same attention mechanism across two different forms of feature map aggregation, max pooling and average pooling, gives better performance than either alone. Third, we observe that residual feature reuse in this setting propagates information more effectively between the layers and makes the network easier to train. Our architecture achieves state-of-the-art results on the task of point cloud classification, as demonstrated on the ModelNet40 dataset, and extremely competitive performance on the ShapeNet part segmentation challenge.
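The first two ideas can be sketched as follows. In this simplification the attention scores come from the features themselves, whereas the real FAT layer learns the weighting; `fat_blend` and `dual_pool` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fat_blend(point_feat, edge_feat):
    """Per point and per channel, weigh how much to trust the global
    point embedding versus the local edge embedding, then blend them."""
    stacked = np.stack([point_feat, edge_feat], axis=-1)   # (N, C, 2)
    attn = softmax(stacked, axis=-1)                       # convex weights over the two branches
    return (stacked * attn).sum(axis=-1)                   # (N, C) blended embedding

def dual_pool(features):
    """Aggregate the point cloud two ways; the paper applies the same
    attention over both descriptors rather than picking one pooling."""
    return features.max(axis=0), features.mean(axis=0)

rng = np.random.default_rng(0)
pts = rng.standard_normal((32, 16))    # global point-based features
edges = rng.standard_normal((32, 16))  # local edge-based features
blended = fat_blend(pts, edges)
max_desc, avg_desc = dual_pool(blended)
```

Because the blend is a convex combination, every output value lies between the corresponding point and edge features, so neither branch can be drowned out entirely.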
Optimizing Vision Transformers for Medical Image Segmentation
For medical image semantic segmentation (MISS), Vision Transformers have
emerged as strong alternatives to convolutional neural networks thanks to their
inherent ability to capture long-range correlations. However, existing research
uses off-the-shelf vision Transformer blocks based on linear projections and
feature processing which lack spatial and local context to refine organ
boundaries. Furthermore, Transformers do not generalize well on small medical
imaging datasets and rely on large-scale pre-training due to limited inductive
biases. To address these problems, we demonstrate the design of a compact and
accurate Transformer network for MISS, CS-Unet, which introduces convolutions
in a multi-stage design for hierarchically enhancing spatial and local modeling
ability of Transformers. This is mainly achieved by our well-designed
Convolutional Swin Transformer (CST) block which merges convolutions with
Multi-Head Self-Attention and Feed-Forward Networks for providing inherent
localized spatial context and inductive biases. Experiments demonstrate CS-Unet
without pre-training outperforms other counterparts by large margins on
multi-organ and cardiac datasets with fewer parameters and achieves
state-of-the-art performance. Our code is available on GitHub.
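The convolution-plus-attention mixing can be sketched as below. This is a heavily simplified, single-head version over a 1-D token sequence; the real CST block uses windowed multi-head attention and 2-D convolutions inside a Swin-style hierarchy, and all weights here are placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv_attention_tokens(x, wq, wk, wv, kernel):
    """Mix convolutional locality into self-attention.

    x: (N, C) tokens from a flattened feature map.
    wq, wk, wv: (C, C) projection weights.
    kernel: (3,) depthwise kernel supplying local spatial context.
    """
    # local branch: depthwise convolution injects an inductive bias for locality
    local = np.stack(
        [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])],
        axis=1,
    )
    # global branch: scaled dot-product self-attention over all tokens
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)
    return local + attn @ v    # localized context plus long-range correlations

rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 4))
w = [0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
out = conv_attention_tokens(tokens, *w, kernel=np.array([0.25, 0.5, 0.25]))
```

The convolutional branch gives each token a fixed local receptive field, which is exactly the kind of inductive bias the abstract argues off-the-shelf Transformer blocks lack on small datasets.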
Continuous Interaction With a Smart Speaker via Low-Dimensional Embeddings of Dynamic Hand Pose
This paper presents a new continuous interaction strategy with visual feedback of hand pose and mid-air gesture recognition and control for a smart music speaker, which utilizes only two video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles into a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures, which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we manage to learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
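The joint optimisation of autoencoder and classifier amounts to a combined objective, sketched below. The trade-off weight `lam` and the exact term shapes are assumptions for illustration, not values from the paper.

```python
import numpy as np

def joint_loss(pose, recon, logits, label, lam=1.0):
    """Joint objective sketch: the reconstruction term keeps the 2-D
    embedding faithful to the hand pose, while the classification term
    on the same latent shapes it to discriminate gestures."""
    mse = np.mean((pose - recon) ** 2)        # autoencoder reconstruction error
    p = np.exp(logits - logits.max())         # stable softmax over gesture classes
    p = p / p.sum()
    xent = -np.log(p[label] + 1e-12)          # cross-entropy on the predicted gesture
    return mse + lam * xent

pose = np.random.default_rng(0).standard_normal(42)   # 21 landmarks, x and y each
logits = np.array([2.0, 0.1, -1.0])                   # scores for 3 example gestures
loss_good = joint_loss(pose, pose, logits, label=0)           # perfect reconstruction
loss_bad = joint_loss(pose, np.zeros(42), logits, label=0)    # degenerate reconstruction
```

Training on the sum rather than the reconstruction term alone is what pulls same-gesture poses together in the 2-D space, which is the stated benefit of the joint optimisation.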
Survey: Leakage and Privacy at Inference Time
Leakage of data from publicly available Machine Learning (ML) models is an
area of growing significance as commercial and government applications of ML
can draw on multiple sources of data, potentially including users' and clients'
sensitive data. We provide a comprehensive survey of contemporary advances on
several fronts, covering involuntary data leakage which is natural to ML
models, potential malevolent leakage which is caused by privacy attacks, and
currently available defence mechanisms. We focus on inference-time leakage, as
the most likely scenario for publicly available models. We first discuss what
leakage is in the context of different data, tasks, and model architectures. We
then propose a taxonomy across involuntary and malevolent leakage, available
defences, followed by the currently available assessment metrics and
applications. We conclude with outstanding challenges and open questions,
outlining some promising directions for future research.